1. Load Packages

source("./Mean Reversion/RMR.001 Load Packages.R") 

2. Load Data

pricing_data <- read_csv("./Mean Reversion/Raw Data/pricing data.csv") 
## Parsed with column specification:
## cols(
##   date_unix = col_integer(),
##   date_time = col_datetime(format = ""),
##   high = col_double(),
##   low = col_double(),
##   open = col_double(),
##   close = col_double(),
##   volume = col_double(),
##   quote_volume = col_double(),
##   weighted_average = col_double(),
##   currency_pair = col_character(),
##   period = col_integer()
## )

3. Prepare Data Function

Description
Spreads Poloniex pricing data into wide format (one column per currency pair) and filters it to a specified time resolution and time window.

Arguments
pricing_data: A dataframe containing pricing data from Poloniex gathered in tidy format.
time_resolution: The number of seconds that each observation spans. Takes values 300, 900, 1800, 7200, 14400, and 86400.
start_date: The start date of the time window.
end_date: The end date of the time window.

prepare_data <- function(pricing_data, time_resolution, start_date, end_date) { 
  df <- pricing_data %>% 
    filter(period == time_resolution, 
           date_time >= start_date, 
           date_time <= end_date) %>% 
    select(date_unix, date_time, close, currency_pair) %>% 
    spread(currency_pair, close) 
  return(df)
} 
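The reshaping step can be illustrated on a tiny, made-up dataset. This is only a sketch: the tibble toy_prices and its values are hypothetical and simply mimic the column layout of pricing_data shown in section 2.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data in the same long ("tidy") layout as pricing_data:
# one row per timestamp and currency pair.
toy_prices <- tibble(
  date_unix     = rep(c(1504224000L, 1504224900L), times = 2),
  date_time     = rep(as.POSIXct(c("2017-09-01 00:00:00", "2017-09-01 00:15:00"),
                                 tz = "UTC"), times = 2),
  close         = c(4700, 4710, 385, 386),
  currency_pair = rep(c("USDT_BTC", "USDT_ETH"), each = 2),
  period        = 900L
)

# Same pipeline shape as prepare_data(): filter, keep the close price,
# then spread each currency pair into its own column.
toy_wide <- toy_prices %>%
  filter(period == 900) %>%
  select(date_unix, date_time, close, currency_pair) %>%
  spread(currency_pair, close)

print(toy_wide)
```

The result has one row per timestamp and one price column per currency pair, which is the shape the later functions index with train[[coin_y]] and train[[coin_x]].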

4. Test Cointegration Function

Description
The Engle-Granger method is used to test for cointegration. This method consists of two steps: (1) Perform a linear regression of log(coin_y) on log(coin_x). (2) Perform an Augmented Dickey-Fuller (ADF) test on the residuals from the linear regression estimated in (1). The ADF test is specified with a non-zero mean, no time-based trend, and one autoregressive lag. The function returns the ADF test statistic; more negative values indicate stronger evidence that the residuals are stationary.

Arguments
coin_y: A vector containing the pricing data for the dependent coin in the regression.
coin_x: A vector containing the pricing data for the independent coin in the regression.

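test_cointegration <- function(coin_y, coin_x) { 
  # Step 1: regress log(coin_y) on log(coin_x)
  lm_model <- lm(log(coin_y) ~ log(coin_x))  
  lm_residuals <- lm_model[["residuals"]] 
  # Step 2: ADF test on the residuals (non-zero mean, no trend, one lag)
  adf_test <- ur.df(lm_residuals, type = "drift", lags = 1) 
  # t value of the lagged level term (row 2, column 3 of the coefficient table)
  df_stat <- adf_test@testreg[["coefficients"]][2, 3]
  return(df_stat) 
} 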
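To see what the statistic looks like when two series really are cointegrated, the two steps can be run on simulated data. This is a minimal sketch under stated assumptions: sim_x and sim_y are hypothetical series built so that log(sim_y) is a linear function of log(sim_x) plus stationary AR(1) noise, and the parameter values are arbitrary.

```r
library(urca)

set.seed(42)
n <- 1000

# log(sim_x) follows a random walk, so it is non-stationary.
log_x <- 5 + cumsum(rnorm(n, sd = 0.01))

# Stationary AR(1) noise around the long-run relationship.
noise <- numeric(n)
for (t in 2:n) {
  noise[t] <- 0.5 * noise[t - 1] + rnorm(1, sd = 0.02)
}

# log(sim_y) = 1 + 0.8 * log(sim_x) + noise, so the pair is cointegrated by construction.
sim_x <- exp(log_x)
sim_y <- exp(1 + 0.8 * log_x + noise)

# The same two Engle-Granger steps as in test_cointegration().
step1 <- lm(log(sim_y) ~ log(sim_x))
step2 <- ur.df(residuals(step1), type = "drift", lags = 1)
adf_stat <- step2@testreg[["coefficients"]][2, 3]

print(adf_stat)
```

Because the residuals are stationary by construction, the statistic should come out strongly negative, well beyond the cutoff applied later in select_pairs().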

5. Create Coin Pairs Function

Description
Two sets of currency pairs are examined: currency pairs where USDT is the quote currency and currency pairs where BTC is the quote currency. All combinations of coins are created within a given quote currency. Combinations that consist of the coin with itself are removed. The function returns a dataframe containing the coin pairs.

Arguments
quote_currency: A string indicating the quote currency of the currency pairs. Can take values USDT or BTC.

create_pairs <- function(quote_currency) { 
  if (quote_currency == "USDT") { 
    coin_list <- c("USDT_BTC", "USDT_DASH", "USDT_ETH", "USDT_LTC", "USDT_REP", "USDT_XMR", "USDT_ZEC")
  } 
  if (quote_currency == "BTC") { 
    coin_list <- c("BTC_DASH", "BTC_ETH", "BTC_LTC", "BTC_REP", "BTC_XEM", "BTC_XMR", "BTC_ZEC")
  } 
  coin_pairs <- expand.grid(coin_list, coin_list) %>% 
    rename(coin_y = Var1, 
           coin_x = Var2) %>% 
    filter(coin_y != coin_x) %>% 
    mutate_if(is.factor, as.character) %>%
    as_tibble() 
  return(coin_pairs)
} 
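The pair construction yields ordered pairs, so each combination appears twice, once in each regression direction. The sketch below reproduces the idea in base R with a hypothetical three-coin list; with n coins it gives n * (n - 1) rows, which is 42 for the seven-coin lists above.

```r
# Hypothetical short coin list for illustration.
coins <- c("USDT_BTC", "USDT_ETH", "USDT_LTC")

# All ordered combinations, then drop a coin paired with itself.
pairs <- expand.grid(coin_y = coins, coin_x = coins, stringsAsFactors = FALSE)
pairs <- pairs[pairs$coin_y != pairs$coin_x, ]

print(pairs)
print(nrow(pairs))  # n * (n - 1) ordered pairs
```

Keeping both directions matters because the Engle-Granger regression is not symmetric: regressing y on x and x on y give different residuals and therefore different ADF statistics.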

6. Test Coin Pairs Function

Description
Test for cointegration between each coin pair generated by the create_pairs() function. The test for cointegration is performed by the test_cointegration() function. The function returns a dataframe containing the coin pairs and the ADF test statistic resulting from testing cointegration between each coin pair.

Arguments
train: A dataframe generated by prepare_data() that represents the training set for the coin pairs.
coin_pairs: A dataframe generated by create_pairs().

test_pairs <- function(train, coin_pairs) { 
  adf_stat <- c() 
  for (n in 1:nrow(coin_pairs)) { 
    coin_y <- coin_pairs[[n, "coin_y"]] 
    coin_x <- coin_pairs[[n, "coin_x"]] 
    cointegration_results <- test_cointegration(coin_y = train[[coin_y]], coin_x = train[[coin_x]])
    adf_stat <- c(adf_stat, cointegration_results)
  } 
  df <- coin_pairs %>% 
    mutate(adf_stat = adf_stat) %>% 
    arrange(adf_stat)
  return(df) 
} 

7. Select Coin Pairs Function

Description
Select cointegrated coin pairs to be used in a mean reversion strategy. The current selection logic keeps all coin pairs whose ADF test statistic is less than or equal to -3.43.

Arguments
train: A dataframe generated by prepare_data() that represents the training set for the coin pair.
coin_pairs: A dataframe generated by create_pairs().

select_pairs <- function(train, coin_pairs) { 
  set.seed(5) 
  df <- test_pairs(train = train, coin_pairs = coin_pairs) %>% 
    filter(adf_stat <= -3.43)
  return(df) 
} 
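The -3.43 cutoff in select_pairs() can be compared with the critical values that urca attaches to a test object. The sketch below is an assumption-laden illustration: it fits the same drift specification to a hypothetical simulated random walk only to obtain an object whose cval slot can be inspected. Strictly speaking, a test on estimated regression residuals calls for Engle-Granger style critical values, which are more negative than the plain ADF ones, so the tabulated values should be read as a rough guide.

```r
library(urca)

set.seed(1)
# Hypothetical random walk, used only to obtain a ur.df object.
random_walk <- cumsum(rnorm(2000))

adf <- ur.df(random_walk, type = "drift", lags = 1)

# Matrix of critical values: the tau2 row belongs to the t-type statistic
# that test_cointegration() extracts.
print(adf@cval)
tau2_cval <- adf@cval["tau2", ]
```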

8. Generate Signals Function

Description
Generate trading signals that indicate the current position in the spread formed by a linear combination of coin y and coin x. A signal of +1 indicates a full long position in the spread, 0 indicates a flat position, and -1 indicates a full short position in the spread. Intermediate values in steps of 0.25 indicate partial positions. Signals are generated for the test set using a model trained on the training set.

The current trading logic is to perform a linear regression of log(coin y) on log(coin x) using the training set. A spread is then calculated in the test set using the fitted hedge ratio and intercept from the regression. The z-score of the spread is calculated using the mean and standard deviation of the training-set residuals, and the signal is based on the z-score lagged by one period. The position is scaled with the size of the deviation: a long position of 0.25, 0.50, 0.75, or 1.00 is held when the lagged z-score lies in (-1, 0], (-2, -1], (-3, -2], or (-4, -3], respectively, and the short side mirrors this for positive z-scores. The long and short signals are summed, so a z-score of exactly 0 gives a flat position. The position is also closed when the z-score reaches +4 or -4 and is re-entered when it returns to within the +4 or -4 range.

Arguments
train: A dataframe generated by prepare_data() that represents the training set for the coin pair.
test: A dataframe generated by prepare_data() that represents the test set for the coin pair.
coin_y: A string indicating the dependent coin in the coin pair regression.
coin_x: A string indicating the independent coin in the coin pair regression.
threshold_z: A number indicating the absolute value of the z-score threshold for entering a position in the spread. The scaled-position logic in the function body does not currently reference this argument.

generate_signals <- function(train, test, coin_y, coin_x, threshold_z) { 
  model <- lm(log(train[[coin_y]]) ~ log(train[[coin_x]]))    
  intercept <- coef(model)[1] 
  hedge_ratio <- coef(model)[2] 
  df_signals <- test %>% 
    mutate(spread = log(test[[coin_y]]) - log(test[[coin_x]]) * hedge_ratio - intercept, 
           spread_z = (spread - mean(model[["residuals"]])) / sd(model[["residuals"]]), 
           lag_spread_z = lag(spread_z, 1), 
           signal_long = ifelse(lag_spread_z <=  0 & lag_spread_z > -1, 0.25, 0), 
           signal_long = ifelse(lag_spread_z <= -1 & lag_spread_z > -2, 0.50, signal_long), 
           signal_long = ifelse(lag_spread_z <= -2 & lag_spread_z > -3, 0.75, signal_long), 
           signal_long = ifelse(lag_spread_z <= -3 & lag_spread_z > -4, 1.00, signal_long), 
           signal_long = ifelse(lag_spread_z <= -4, 0, signal_long), 
           signal_short = ifelse(lag_spread_z >= 0 & lag_spread_z < 1, -0.25, 0), 
           signal_short = ifelse(lag_spread_z >= 1 & lag_spread_z < 2, -0.50, signal_short), 
           signal_short = ifelse(lag_spread_z >= 2 & lag_spread_z < 3, -0.75, signal_short), 
           signal_short = ifelse(lag_spread_z >= 3 & lag_spread_z < 4, -1.00, signal_short), 
           signal_short = ifelse(lag_spread_z >= 4, 0, signal_short), 
           signal = signal_long + signal_short, 
           signal = ifelse(is.na(signal), 0, signal)) 
  return(df_signals[["signal"]])
} 
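The mapping from z-score to signal can be isolated and checked on a grid of values. The helper z_to_signal() below is hypothetical (it is not part of the original code); it copies the ifelse() chain from generate_signals() but takes the z-score directly, without the one-period lag.

```r
# Hypothetical helper: same bucket logic as generate_signals(), applied to a
# vector of z-scores that are assumed to be already lagged.
z_to_signal <- function(z) {
  signal_long <- ifelse(z <=  0 & z > -1, 0.25, 0)
  signal_long <- ifelse(z <= -1 & z > -2, 0.50, signal_long)
  signal_long <- ifelse(z <= -2 & z > -3, 0.75, signal_long)
  signal_long <- ifelse(z <= -3 & z > -4, 1.00, signal_long)
  signal_long <- ifelse(z <= -4, 0, signal_long)
  signal_short <- ifelse(z >= 0 & z < 1, -0.25, 0)
  signal_short <- ifelse(z >= 1 & z < 2, -0.50, signal_short)
  signal_short <- ifelse(z >= 2 & z < 3, -0.75, signal_short)
  signal_short <- ifelse(z >= 3 & z < 4, -1.00, signal_short)
  signal_short <- ifelse(z >= 4, 0, signal_short)
  signal <- signal_long + signal_short
  ifelse(is.na(signal), 0, signal)  # missing z-score means no position
}

z_grid <- c(-4.5, -3.5, -2.5, -1.5, -0.5, 0, 0.5, 1.5, 2.5, 3.5, 4.5)
print(data.frame(z = z_grid, signal = z_to_signal(z_grid)))
```

The table makes the shape of the rule visible: the position grows as the spread moves further from its mean, flips sign at zero, and drops to flat once the deviation reaches four standard deviations.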

9. Backtest Pair Function

Description
Calculate the return of a cointegration-based mean reversion trading strategy using coin y and coin x.

The current backtesting logic uses signals generated by generate_signals(). The coin_y_return and coin_x_return indicate the one period percentage return of each coin. The coin_y_position and coin_x_position indicate the market value in USD in each coin. coin_y_pnl and coin_x_pnl indicate the USD value of the profit and loss for each coin. The combined_position indicates the gross market value of the combined positions.

Arguments
train: A dataframe generated by prepare_data() that represents the training set for the coin pair.
test: A dataframe generated by prepare_data() that represents the test set for the coin pair.
coin_y: A string indicating the dependent coin in the coin pair regression.
coin_x: A string indicating the independent coin in the coin pair regression.
threshold_z: A number indicating the absolute value of the z-score threshold for entering a position in the spread.

backtest_pair <- function(train, test, coin_y, coin_x, threshold_z) { 
  model <- lm(log(train[[coin_y]]) ~ log(train[[coin_x]]))   
  intercept <- coef(model)[1] 
  hedge_ratio <- coef(model)[2] 
  df_backtest <- test %>% 
    mutate(signal = generate_signals(train = train, 
                                     test = test, 
                                     coin_y = coin_y, 
                                     coin_x = coin_x, 
                                     threshold_z = threshold_z), 
           coin_y_return = test[[coin_y]] / lag(test[[coin_y]], 1) - 1, 
           coin_x_return = test[[coin_x]] / lag(test[[coin_x]], 1) - 1, 
           coin_y_position = signal * 1           *  1, 
           coin_x_position = signal * hedge_ratio * -1,  
           coin_y_pnl = lag(coin_y_position, 1) * coin_y_return, 
           coin_x_pnl = lag(coin_x_position, 1) * coin_x_return, 
           combined_position = abs(coin_y_position) + abs(coin_x_position), 
           combined_pnl = coin_y_pnl + coin_x_pnl, 
           combined_return = combined_pnl / (1 + hedge_ratio)) %>% 
    mutate_all(funs(ifelse(is.na(.), 0, .))) %>% 
    mutate(return_pair = cumprod(1 + combined_return)) 
  return(df_backtest[["return_pair"]])
} 
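The profit and loss arithmetic can be traced by hand on three periods of made-up prices. This is a sketch in base R, assuming a constant signal of 1 and a hedge ratio of 1; lag1() is a hypothetical stand-in for dplyr's lag(), and all prices are invented.

```r
# Hypothetical one-period lag helper (base R stand-in for dplyr::lag).
lag1 <- function(v) c(NA, head(v, -1))

price_y     <- c(100, 110, 121)  # made-up prices for coin y
price_x     <- c(50, 50, 55)     # made-up prices for coin x
signal      <- c(1, 1, 1)        # always fully long the spread
hedge_ratio <- 1

# One-period percentage returns of each coin.
ret_y <- price_y / lag1(price_y) - 1
ret_x <- price_x / lag1(price_x) - 1

# Long coin y, short hedge_ratio units of coin x.
pos_y <- signal * 1
pos_x <- signal * hedge_ratio * -1

# P&L uses the position held at the end of the previous period.
pnl_y <- lag1(pos_y) * ret_y
pnl_x <- lag1(pos_x) * ret_x

# Scale by the gross exposure (1 + hedge_ratio), then compound.
combined_return <- (pnl_y + pnl_x) / (1 + hedge_ratio)
combined_return[is.na(combined_return)] <- 0
return_pair <- cumprod(1 + combined_return)

print(return_pair)
```

In the second period coin y gains 10% while coin x is flat, so the spread position earns 0.10 on a gross exposure of 2, a 5% return. In the third period both coins gain 10% and the long and short legs cancel.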

10. Backtest Strategy Function

Description
Calculate the return of a cointegration-based mean reversion trading strategy using an equally weighted portfolio of cointegrated coin pairs.

Arguments
train: A dataframe generated by prepare_data() that represents the training set for the coin pair.
test: A dataframe generated by prepare_data() that represents the test set for the coin pair.
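selected_pairs: A dataframe generated by select_pairs() that represents a set of cointegrated coin pairs.
threshold_z: A number indicating the absolute value of the z-score threshold for entering a position in the spread. It is passed through to backtest_pair().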

backtest_strategy <- function(train, test, selected_pairs, threshold_z) { 
  if (nrow(selected_pairs) == 0) { 
    return(1) 
  } 
  df <- tibble()  
  for (i in 1:nrow(selected_pairs)) { 
    single_pair <- tibble(
      return_pair = backtest_pair(train = train, 
                                  test = test, 
                                  coin_y = selected_pairs[["coin_y"]][i], 
                                  coin_x = selected_pairs[["coin_x"]][i], 
                                  threshold_z = threshold_z), 
      coin_y = selected_pairs[["coin_y"]][i], 
      coin_x = selected_pairs[["coin_x"]][i], 
      date_time = test[["date_time"]]
    )
    df <- bind_rows(df, single_pair)
  }
  df <- df %>% 
    group_by(date_time) %>% 
    summarise(return_strategy = mean(return_pair)) 
  return(df[["return_strategy"]])
} 

11. Plot Single Function

Description
Create plots of a cointegration-based mean reversion trading strategy for a single coin pair composed of coin y and coin x. There are two plots created by this function. The first plot displays the spread transformed into a z-score in blue, with three red lines at -2, 0, and 2. A green line indicates the signal, which ranges from -1 to +1. The second plot displays the cumulative return of the model in blue. Two additional lines show the buy and hold return of coin y and coin x as red and green lines, respectively.

Arguments
train: A dataframe generated by prepare_data() that represents the training set for the coin pair.
test: A dataframe generated by prepare_data() that represents the test set for the coin pair.
coin_y: A string indicating the dependent coin in the coin pair regression.
coin_x: A string indicating the independent coin in the coin pair regression.
threshold_z: A number indicating the absolute value of the z-score threshold for entering a position in the spread.

plot_single <- function(train, test, coin_y, coin_x, threshold_z) { 
  model <- lm(log(train[[coin_y]]) ~ log(train[[coin_x]]))   
  intercept <- coef(model)[1] 
  hedge_ratio <- coef(model)[2] 
  df_plot <- test %>% 
    mutate(spread = log(test[[coin_y]]) - log(test[[coin_x]]) * hedge_ratio - intercept, 
           spread_z = (spread - mean(model[["residuals"]])) / sd(model[["residuals"]]), 
           signal = generate_signals(train = train, 
                                     test = test, 
                                     coin_y = coin_y, 
                                     coin_x = coin_x, 
                                     threshold_z = threshold_z), 
           return_pair = backtest_pair(train = train, 
                                       test = test, 
                                       coin_y = coin_y, 
                                       coin_x = coin_x, 
                                       threshold_z = threshold_z), 
           return_buyhold_y = test[[coin_y]] / test[[coin_y]][1], 
           return_buyhold_x = test[[coin_x]] / test[[coin_x]][1])
  print(summary(model)) 
  print(ggplot(df_plot, aes(x = date_time)) + 
          geom_line(aes(y = spread_z, colour = "Spread Z"), size = 1) + 
          geom_line(aes(y = signal, colour = "Signal"), size = 0.5) + 
          geom_hline(yintercept = 0, colour = "red", alpha = 0.5) + 
          geom_hline(yintercept = 2, colour = "red", alpha = 0.5) + 
          geom_hline(yintercept = -2, colour = "red", alpha = 0.5) + 
          scale_color_manual(name = "Series", 
                             values = c("Spread Z" = "blue", 
                                        "Signal" = "green")) + 
          labs(title = "Spread vs Trading Signal", 
               subtitle = str_c(coin_y, " and ", coin_x), 
               x = "Date", 
               y = "Spread and Signal")) 
  print(ggplot(df_plot, aes(x = date_time)) + 
          geom_line(aes(y = return_pair, colour = "Model"), size = 1) + 
          geom_line(aes(y = return_buyhold_y, colour = "Coin Y"), size = 0.5, alpha = 0.4) + 
          geom_line(aes(y = return_buyhold_x, colour = "Coin X"), size = 0.5, alpha = 0.4) + 
          geom_hline(yintercept = 1, colour = "black") + 
          scale_color_manual(name = "Return", 
                             values = c("Model" = "darkblue", 
                                        "Coin Y" = "darkred", 
                                        "Coin X" = "darkgreen")) + 
          labs(title = "Model Return vs Buy Hold Return", 
               subtitle = str_c(coin_y, " and ", coin_x), 
               x = "Date", 
               y = "Cumulative Return"))
} 

12. Plot Many Function

Description
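Create many plots by calling the plot_single() function multiple times. Also creates a plot showing the results of the overall strategy against the buy and hold return of USDT_BTC. The function builds a train set and a test set on either side of a cutoff date and creates plots for up to the top 10 selected coin pairs ranked by their ADF statistic. Note that quote_currency is not an argument: the function reads it from the global environment, where it is set in section 13.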

Arguments
pricing_data: A dataframe containing pricing data from Poloniex gathered in tidy format.
time_resolution: The number of seconds that each observation spans. Takes values 300, 900, 1800, 7200, 14400, and 86400.
cutoff_date: A date representing the cutoff date between the train and test sets.
train_window: A period object from the lubridate package representing the length of time the train set covers.
test_window: A period object from the lubridate package representing the length of time the test set covers.
threshold_z: A number indicating the absolute value of the z-score threshold for entering a position in the spread.

plot_many <- function(pricing_data, time_resolution, cutoff_date, train_window, test_window, threshold_z) { 
  train <- prepare_data(pricing_data = pricing_data, 
                        time_resolution = time_resolution, 
                        start_date = as.Date(cutoff_date) - train_window, 
                        end_date = as.Date(cutoff_date)) 
  test <- prepare_data(pricing_data = pricing_data, 
                       time_resolution = time_resolution, 
                       start_date = as.Date(cutoff_date), 
                       end_date = as.Date(cutoff_date) + test_window) 
  selected_pairs <- select_pairs(train = train, 
                                 coin_pairs = create_pairs(quote_currency = quote_currency))
  if (nrow(selected_pairs) == 0) { 
    return("No coin pairs selected.")
  } 
  print(selected_pairs) 
  for (i in 1:min(10, nrow(selected_pairs))) { 
    plot_single(train = train, 
                test = test, 
                coin_y = selected_pairs[["coin_y"]][i], 
                coin_x = selected_pairs[["coin_x"]][i], 
                threshold_z = threshold_z)
  } 
  test <- test %>% 
    mutate(return_strategy = backtest_strategy(train = train, 
                                               test = ., 
                                               selected_pairs = selected_pairs, 
                                               threshold_z = threshold_z)) 
  ggplot(test, aes(x = date_time)) + 
    geom_line(aes(y = return_strategy, colour = "Strategy"), size = 1) + 
    geom_line(aes(y = USDT_BTC / USDT_BTC[1], colour = "USDT_BTC"), size = 0.5, alpha = 0.4) + 
    geom_hline(yintercept = 1, colour = "black") + 
    scale_color_manual(name = "Return", 
                       values = c("Strategy" = "darkblue", 
                                  "USDT_BTC" = "darkred")) + 
    labs(title = "Strategy Return vs Buy Hold Return", 
         x = "Date", 
         y = "Cumulative Return") 
} 
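The window arithmetic in plot_many() can be checked in isolation. The sketch below assumes the parameter values used in this document (a 32-day train window, a 16-day test window, 900-second bars) and a cutoff of 2017-09-01; the variable names are hypothetical.

```r
library(lubridate)

cutoff      <- as.Date("2017-09-01")
train_start <- cutoff - days(32)  # start of the train window
test_end    <- cutoff + days(16)  # end of the test window

# Number of 900-second observations in the train window. Both endpoints are
# kept because prepare_data() filters with >= and <=.
n_train <- as.numeric(cutoff - train_start) * 86400 / 900 + 1

print(train_start)
print(test_end)
print(n_train)
```

The count of 3073 observations agrees with the regression summaries in the output sections, which report 3071 residual degrees of freedom for a model with two coefficients. Because both filters are inclusive, the observation at the cutoff itself falls in both the train and the test set.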

13. Set Parameters

quote_currency <- "USDT" 
time_resolution <- 900
train_window <- days(32) 
test_window <- days(16) 
test_by <- "16 days"
threshold_z <- 2 

14. Cross Validation September 2017

plot_many(pricing_data = pricing_data, 
          time_resolution = time_resolution, 
          cutoff_date = "2017-09-01", 
          train_window = train_window, 
          test_window = test_window, 
          threshold_z = threshold_z) 
## # A tibble: 3 x 3
##     coin_y   coin_x  adf_stat
##      <chr>    <chr>     <dbl>
## 1 USDT_REP USDT_ZEC -5.340746
## 2 USDT_ZEC USDT_REP -5.282765
## 3 USDT_REP USDT_ETH -3.471493
## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.189890 -0.036778 -0.000273  0.030813  0.247341 
## 
## Coefficients:
##                       Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -2.584037   0.042463  -60.85 <0.0000000000000002 ***
## log(train[[coin_x]])  1.043631   0.007849  132.96 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05619 on 3071 degrees of freedom
## Multiple R-squared:  0.852,  Adjusted R-squared:  0.8519 
## F-statistic: 1.768e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.183978 -0.032501 -0.006733  0.033032  0.166360 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)           2.90999    0.01881   154.7 <0.0000000000000002 ***
## log(train[[coin_x]])  0.81637    0.00614   133.0 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0497 on 3071 degrees of freedom
## Multiple R-squared:  0.852,  Adjusted R-squared:  0.8519 
## F-statistic: 1.768e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.13510 -0.04964 -0.01425  0.03677  0.28900 
## 
## Coefficients:
##                       Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -1.310058   0.046970  -27.89 <0.0000000000000002 ***
## log(train[[coin_x]])  0.770269   0.008275   93.08 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07472 on 3071 degrees of freedom
## Multiple R-squared:  0.7383, Adjusted R-squared:  0.7382 
## F-statistic:  8664 on 1 and 3071 DF,  p-value: < 0.00000000000000022

15. Cross Validation August 2017

plot_many(pricing_data = pricing_data, 
          time_resolution = time_resolution, 
          cutoff_date = "2017-08-01", 
          train_window = train_window, 
          test_window = test_window, 
          threshold_z = threshold_z) 
## # A tibble: 8 x 3
##      coin_y    coin_x  adf_stat
##       <chr>     <chr>     <dbl>
## 1  USDT_ETH  USDT_ZEC -4.462054
## 2  USDT_ZEC  USDT_REP -4.361885
## 3  USDT_REP  USDT_ZEC -4.335836
## 4  USDT_ZEC  USDT_ETH -4.290393
## 5 USDT_DASH  USDT_XMR -4.016416
## 6  USDT_XMR USDT_DASH -4.000072
## 7  USDT_ETH  USDT_REP -3.695401
## 8  USDT_REP  USDT_ETH -3.464286
## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.179487 -0.021586  0.003705  0.026061  0.106185 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          1.410750   0.018182   77.59 <0.0000000000000002 ***
## log(train[[coin_x]]) 0.742691   0.003393  218.91 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03676 on 3071 degrees of freedom
## Multiple R-squared:  0.9398, Adjusted R-squared:  0.9398 
## F-statistic: 4.792e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.237564 -0.026330 -0.000836  0.041292  0.252928 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          2.153572   0.020108   107.1 <0.0000000000000002 ***
## log(train[[coin_x]]) 1.058572   0.006636   159.5 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06416 on 3071 degrees of freedom
## Multiple R-squared:  0.8923, Adjusted R-squared:  0.8923 
## F-statistic: 2.545e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.237726 -0.036235 -0.005994  0.034868  0.195045 
## 
## Coefficients:
##                       Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -1.489552   0.028320   -52.6 <0.0000000000000002 ***
## log(train[[coin_x]])  0.842935   0.005284   159.5 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05726 on 3071 degrees of freedom
## Multiple R-squared:  0.8923, Adjusted R-squared:  0.8923 
## F-statistic: 2.545e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.133493 -0.039498 -0.001783  0.034247  0.205442 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -1.46257    0.03116  -46.94 <0.0000000000000002 ***
## log(train[[coin_x]])  1.26537    0.00578  218.91 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04798 on 3071 degrees of freedom
## Multiple R-squared:  0.9398, Adjusted R-squared:  0.9398 
## F-statistic: 4.792e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.108765 -0.034745  0.000672  0.029316  0.158594 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          1.909297   0.030249   63.12 <0.0000000000000002 ***
## log(train[[coin_x]]) 0.883103   0.008161  108.21 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05058 on 3071 degrees of freedom
## Multiple R-squared:  0.7922, Adjusted R-squared:  0.7922 
## F-statistic: 1.171e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.164998 -0.036084 -0.006958  0.042302  0.116351 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -0.94302    0.04296  -21.95 <0.0000000000000002 ***
## log(train[[coin_x]])  0.89708    0.00829  108.21 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05098 on 3071 degrees of freedom
## Multiple R-squared:  0.7922, Adjusted R-squared:  0.7922 
## F-statistic: 1.171e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.20745 -0.02401  0.01148  0.03494  0.16101 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          3.017909   0.019176   157.4 <0.0000000000000002 ***
## log(train[[coin_x]]) 0.783639   0.006329   123.8 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06119 on 3071 degrees of freedom
## Multiple R-squared:  0.8331, Adjusted R-squared:  0.8331 
## F-statistic: 1.533e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.20918 -0.04682 -0.01119  0.05946  0.19466 
## 
## Coefficients:
##                       Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -2.703749   0.046282  -58.42 <0.0000000000000002 ***
## log(train[[coin_x]])  1.063160   0.008586  123.83 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07127 on 3071 degrees of freedom
## Multiple R-squared:  0.8331, Adjusted R-squared:  0.8331 
## F-statistic: 1.533e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

16. Cross Validation July 2017

plot_many(pricing_data = pricing_data, 
          time_resolution = time_resolution, 
          cutoff_date = "2017-07-01", 
          train_window = train_window, 
          test_window = test_window, 
          threshold_z = threshold_z) 
## # A tibble: 8 x 3
##      coin_y   coin_x  adf_stat
##       <chr>    <chr>     <dbl>
## 1  USDT_REP USDT_XMR -5.734962
## 2  USDT_XMR USDT_REP -5.685226
## 3  USDT_BTC USDT_REP -4.539502
## 4  USDT_REP USDT_BTC -4.485586
## 5  USDT_BTC USDT_XMR -4.190621
## 6  USDT_XMR USDT_BTC -4.063350
## 7 USDT_DASH USDT_ZEC -3.551029
## 8 USDT_DASH USDT_LTC -3.535247
## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.210508 -0.031649 -0.000937  0.030277  0.169426 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -1.61487    0.03706  -43.58 <0.0000000000000002 ***
## log(train[[coin_x]])  1.28768    0.00960  134.14 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0562 on 3071 degrees of freedom
## Multiple R-squared:  0.8542, Adjusted R-squared:  0.8542 
## F-statistic: 1.799e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.123985 -0.019981  0.002444  0.026076  0.180642 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          1.633839   0.016603   98.41 <0.0000000000000002 ***
## log(train[[coin_x]]) 0.663368   0.004945  134.14 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04034 on 3071 degrees of freedom
## Multiple R-squared:  0.8542, Adjusted R-squared:  0.8542 
## F-statistic: 1.799e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.124641 -0.021271  0.004241  0.020682  0.085012 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          6.264116   0.013807   453.7 <0.0000000000000002 ***
## log(train[[coin_x]]) 0.469285   0.004113   114.1 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03355 on 3071 degrees of freedom
## Multiple R-squared:  0.8092, Adjusted R-squared:  0.8091 
## F-statistic: 1.302e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.14718 -0.04983 -0.01338  0.05001  0.20053 
## 
## Coefficients:
##                       Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -10.16068    0.11844  -85.78 <0.0000000000000002 ***
## log(train[[coin_x]])   1.72423    0.01511  114.11 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0643 on 3071 degrees of freedom
## Multiple R-squared:  0.8092, Adjusted R-squared:  0.8091 
## F-statistic: 1.302e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.089384 -0.026172  0.003299  0.022341  0.085533 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          5.286158   0.021008   251.6 <0.0000000000000002 ***
## log(train[[coin_x]]) 0.661334   0.005442   121.5 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03186 on 3071 degrees of freedom
## Multiple R-squared:  0.8278, Adjusted R-squared:  0.8278 
## F-statistic: 1.477e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.11391 -0.03020  0.00318  0.03407  0.14419 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -5.95274    0.08074  -73.72 <0.0000000000000002 ***
## log(train[[coin_x]])  1.25177    0.01030  121.52 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04384 on 3071 degrees of freedom
## Multiple R-squared:  0.8278, Adjusted R-squared:  0.8278 
## F-statistic: 1.477e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.192122 -0.032241  0.007943  0.041034  0.196809 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          1.629257   0.035553   45.83 <0.0000000000000002 ***
## log(train[[coin_x]]) 0.601409   0.006213   96.80 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06764 on 3071 degrees of freedom
## Multiple R-squared:  0.7532, Adjusted R-squared:  0.7531 
## F-statistic:  9371 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.21086 -0.03903 -0.01176  0.02116  0.24807 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          3.196284   0.021880  146.08 <0.0000000000000002 ***
## log(train[[coin_x]]) 0.533891   0.006227   85.74 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0739 on 3071 degrees of freedom
## Multiple R-squared:  0.7054, Adjusted R-squared:  0.7053 
## F-statistic:  7352 on 1 and 3071 DF,  p-value: < 0.00000000000000022

17. Cross Validation June 2017

plot_many(pricing_data = pricing_data, 
          time_resolution = time_resolution, 
          cutoff_date = "2017-06-01", 
          train_window = train_window, 
          test_window = test_window, 
          threshold_z = threshold_z) 
## # A tibble: 14 x 3
##       coin_y    coin_x  adf_stat
##        <chr>     <chr>     <dbl>
##  1  USDT_REP USDT_DASH -6.302270
##  2 USDT_DASH  USDT_REP -6.164980
##  3 USDT_DASH  USDT_ZEC -4.838517
##  4  USDT_REP  USDT_ZEC -4.765335
##  5  USDT_ZEC USDT_DASH -4.419407
##  6 USDT_DASH  USDT_XMR -4.338766
##  7  USDT_REP  USDT_XMR -4.334303
##  8  USDT_XMR USDT_DASH -4.299602
##  9  USDT_ZEC  USDT_REP -4.132672
## 10  USDT_XMR  USDT_REP -4.062863
## 11  USDT_XMR  USDT_ZEC -3.974349
## 12 USDT_DASH  USDT_ETH -3.746119
## 13  USDT_XMR  USDT_ETH -3.539918
## 14  USDT_ZEC  USDT_XMR -3.518425
## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.32075 -0.04357  0.00580  0.04444  0.25546 
## 
## Coefficients:
##                       Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -2.093575   0.040614  -51.55 <0.0000000000000002 ***
## log(train[[coin_x]])  1.085528   0.008785  123.57 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06478 on 3071 degrees of freedom
## Multiple R-squared:  0.8325, Adjusted R-squared:  0.8325 
## F-statistic: 1.527e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.246518 -0.040099 -0.004599  0.032748  0.293567 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          2.379505   0.018168   131.0 <0.0000000000000002 ***
## log(train[[coin_x]]) 0.766951   0.006207   123.6 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05445 on 3071 degrees of freedom
## Multiple R-squared:  0.8325, Adjusted R-squared:  0.8325 
## F-statistic: 1.527e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.31848 -0.04139  0.00042  0.04379  0.19210 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          2.981247   0.015789   188.8 <0.0000000000000002 ***
## log(train[[coin_x]]) 0.339406   0.003259   104.1 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06251 on 3071 degrees of freedom
## Multiple R-squared:  0.7793, Adjusted R-squared:  0.7792 
## F-statistic: 1.084e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.279354 -0.047233  0.001543  0.046953  0.281885 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          0.991651   0.019443   51.00 <0.0000000000000002 ***
## log(train[[coin_x]]) 0.399687   0.004014   99.58 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07698 on 3071 degrees of freedom
## Multiple R-squared:  0.7635, Adjusted R-squared:  0.7635 
## F-statistic:  9916 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.47825 -0.11628 -0.02548  0.06762  0.74612 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -5.77880    0.10194  -56.69 <0.0000000000000002 ***
## log(train[[coin_x]])  2.29608    0.02205  104.14 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1626 on 3071 degrees of freedom
## Multiple R-squared:  0.7793, Adjusted R-squared:  0.7792 
## F-statistic: 1.084e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.24382 -0.05767 -0.01469  0.05710  0.21907 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          2.490431   0.023475  106.09 <0.0000000000000002 ***
## log(train[[coin_x]]) 0.615545   0.006772   90.89 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06927 on 3071 degrees of freedom
## Multiple R-squared:  0.729,  Adjusted R-squared:  0.7289 
## F-statistic:  8262 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.17542 -0.06927 -0.02101  0.08183  0.36063 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          0.483450   0.030581   15.81 <0.0000000000000002 ***
## log(train[[coin_x]]) 0.704709   0.008822   79.88 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09023 on 3071 degrees of freedom
## Multiple R-squared:  0.6751, Adjusted R-squared:  0.675 
## F-statistic:  6381 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.26796 -0.06164  0.02596  0.07016  0.26309 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -2.01152    0.06024  -33.39 <0.0000000000000002 ***
## log(train[[coin_x]])  1.18435    0.01303   90.89 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09608 on 3071 degrees of freedom
## Multiple R-squared:  0.729,  Adjusted R-squared:  0.7289 
## F-statistic:  8262 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.39648 -0.11677 -0.03566  0.06922  0.58622 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -0.75186    0.05615  -13.39 <0.0000000000000002 ***
## log(train[[coin_x]])  1.91035    0.01918   99.58 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1683 on 3071 degrees of freedom
## Multiple R-squared:  0.7635, Adjusted R-squared:  0.7635 
## F-statistic:  9916 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.32131 -0.04028  0.01142  0.07792  0.22094 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)           0.66155    0.03510   18.85 <0.0000000000000002 ***
## log(train[[coin_x]])  0.95798    0.01199   79.88 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1052 on 3071 degrees of freedom
## Multiple R-squared:  0.6751, Adjusted R-squared:  0.675 
## F-statistic:  6381 on 1 and 3071 DF,  p-value: < 0.00000000000000022
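
Several summaries above come in pairs with the same R-squared. For example, the first two June 2017 fits both report 0.8325, with slopes 1.085528 and 0.766951, whose product is about 0.8325. That is what a simple regression gives when the same two series are fitted in both directions. The sketch below checks the identity on simulated log-prices. The data and the names `log_x`, `log_y`, `fit_yx`, and `fit_xy` are illustrative assumptions, not taken from the pricing data.

```r
# Simulated log-prices standing in for two coins (hypothetical data).
set.seed(1)
log_x <- 4 + cumsum(rnorm(500, sd = 0.01))
log_y <- 0.5 + 0.8 * log_x + rnorm(500, sd = 0.02)

fit_yx <- lm(log_y ~ log_x)  # y regressed on x
fit_xy <- lm(log_x ~ log_y)  # x regressed on y

slope_yx <- unname(coef(fit_yx)[2])
slope_xy <- unname(coef(fit_xy)[2])
r2_yx <- summary(fit_yx)$r.squared
r2_xy <- summary(fit_xy)$r.squared

# Both directions share one R-squared, equal to the product of the two slopes.
print(c(r2_yx = r2_yx, r2_xy = r2_xy, slope_product = slope_yx * slope_xy))
```

The residuals, and therefore the ADF statistic, still differ between the two directions, which is presumably why the tables list each ordering of a pair separately.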

18. Cross Validation May 2017

plot_many(pricing_data = pricing_data, 
          time_resolution = time_resolution, 
          cutoff_date = "2017-05-01", 
          train_window = train_window, 
          test_window = test_window, 
          threshold_z = threshold_z) 
## # A tibble: 23 x 3
##       coin_y    coin_x  adf_stat
##        <chr>     <chr>     <dbl>
##  1  USDT_LTC USDT_DASH -5.247891
##  2  USDT_LTC  USDT_BTC -5.055356
##  3  USDT_REP  USDT_ETH -4.910632
##  4 USDT_DASH  USDT_ZEC -4.609111
##  5 USDT_DASH  USDT_ETH -4.531939
##  6  USDT_LTC  USDT_ETH -4.516577
##  7  USDT_XMR  USDT_ZEC -4.484302
##  8  USDT_LTC  USDT_ZEC -4.362703
##  9 USDT_DASH  USDT_REP -4.351660
## 10  USDT_REP USDT_DASH -4.303321
## # ... with 13 more rows
## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.16346 -0.05209  0.02656  0.10919  0.36281 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -3.18276    0.14881  -21.39 <0.0000000000000002 ***
## log(train[[coin_x]])  1.30622    0.03459   37.76 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2275 on 3071 degrees of freedom
## Multiple R-squared:  0.3171, Adjusted R-squared:  0.3168 
## F-statistic:  1426 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.49274 -0.07323  0.00271  0.08408  0.36510 
## 
## Coefficients:
##                       Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -18.32546    0.15969  -114.8 <0.0000000000000002 ***
## log(train[[coin_x]])   2.91569    0.02243   130.0 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.108 on 3071 degrees of freedom
## Multiple R-squared:  0.8462, Adjusted R-squared:  0.8462 
## F-statistic: 1.69e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.294547 -0.034051 -0.002455  0.043462  0.299088 
## 
## Coefficients:
##                       Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -1.151332   0.036864  -31.23 <0.0000000000000002 ***
## log(train[[coin_x]])  0.925974   0.009372   98.80 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07627 on 3071 degrees of freedom
## Multiple R-squared:  0.7607, Adjusted R-squared:  0.7606 
## F-statistic:  9762 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.40903 -0.02999  0.00315  0.02929  0.21342 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          1.294081   0.036772   35.19 <0.0000000000000002 ***
## log(train[[coin_x]]) 0.708576   0.008663   81.79 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06657 on 3071 degrees of freedom
## Multiple R-squared:  0.6854, Adjusted R-squared:  0.6853 
## F-statistic:  6690 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.44208 -0.03115  0.01506  0.03302  0.13137 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          1.540921   0.028427   54.20 <0.0000000000000002 ***
## log(train[[coin_x]]) 0.701951   0.007227   97.13 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05881 on 3071 degrees of freedom
## Multiple R-squared:  0.7544, Adjusted R-squared:  0.7543 
## F-statistic:  9434 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.01640 -0.04937  0.02866  0.12096  0.38388 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)           -1.7025     0.1101  -15.46 <0.0000000000000002 ***
## log(train[[coin_x]])   1.0524     0.0280   37.59 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2278 on 3071 degrees of freedom
## Multiple R-squared:  0.3151, Adjusted R-squared:  0.3149 
## F-statistic:  1413 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.078881 -0.020784 -0.003911  0.014606  0.092846 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          1.506220   0.017556   85.80 <0.0000000000000002 ***
## log(train[[coin_x]]) 0.366222   0.004136   88.54 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03178 on 3071 degrees of freedom
## Multiple R-squared:  0.7185, Adjusted R-squared:  0.7184 
## F-statistic:  7840 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.84199 -0.09483 -0.00751  0.14060  0.37788 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -3.69720    0.10427  -35.46 <0.0000000000000002 ***
## log(train[[coin_x]])  1.44527    0.02457   58.83 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1888 on 3071 degrees of freedom
## Multiple R-squared:  0.5299, Adjusted R-squared:  0.5297 
## F-statistic:  3461 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.48670 -0.04195  0.01011  0.03820  0.28170 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)           2.91496    0.02336  124.78 <0.0000000000000002 ***
## log(train[[coin_x]])  0.55663    0.00937   59.41 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08096 on 3071 degrees of freedom
## Multiple R-squared:  0.5347, Adjusted R-squared:  0.5346 
## F-statistic:  3529 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.39145 -0.06261  0.00097  0.06899  0.46005 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -1.64230    0.06956  -23.61 <0.0000000000000002 ***
## log(train[[coin_x]])  0.96061    0.01617   59.41 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1064 on 3071 degrees of freedom
## Multiple R-squared:  0.5347, Adjusted R-squared:  0.5346 
## F-statistic:  3529 on 1 and 3071 DF,  p-value: < 0.00000000000000022

19. Cross Validation April 2017

plot_many(pricing_data = pricing_data,
          time_resolution = time_resolution,
          cutoff_date = "2017-04-01",
          train_window = train_window,
          test_window = test_window,
          threshold_z = threshold_z)
## # A tibble: 6 x 3
##      coin_y    coin_x  adf_stat
##       <chr>     <chr>     <dbl>
## 1  USDT_BTC  USDT_ZEC -3.921811
## 2  USDT_ZEC  USDT_BTC -3.894445
## 3 USDT_DASH  USDT_XMR -3.786044
## 4  USDT_XMR USDT_DASH -3.627830
## 5  USDT_XMR  USDT_ZEC -3.562814
## 6  USDT_ZEC  USDT_XMR -3.508866
## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.20449 -0.03945  0.00446  0.04611  0.09902 
## 
## Coefficients:
##                       Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)           8.306614   0.014881  558.21 <0.0000000000000002 ***
## log(train[[coin_x]]) -0.327595   0.003808  -86.04 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05214 on 3071 degrees of freedom
## Multiple R-squared:  0.7068, Adjusted R-squared:  0.7067 
## F-statistic:  7402 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.56203 -0.10207  0.02872  0.09563  0.36529 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          19.06503    0.17627  108.16 <0.0000000000000002 ***
## log(train[[coin_x]]) -2.15748    0.02508  -86.04 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1338 on 3071 degrees of freedom
## Multiple R-squared:  0.7068, Adjusted R-squared:  0.7067 
## F-statistic:  7402 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.37070 -0.09691  0.01128  0.08815  0.39283 
## 
## Coefficients:
##                      Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)          -0.27203    0.03359  -8.098 0.000000000000000798 ***
## log(train[[coin_x]])  1.58285    0.01176 134.608 < 0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1393 on 3071 degrees of freedom
## Multiple R-squared:  0.8551, Adjusted R-squared:  0.855 
## F-statistic: 1.812e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.261112 -0.044455  0.003694  0.058712  0.253697 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          0.559809   0.017068    32.8 <0.0000000000000002 ***
## log(train[[coin_x]]) 0.540213   0.004013   134.6 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08139 on 3071 degrees of freedom
## Multiple R-squared:  0.8551, Adjusted R-squared:  0.855 
## F-statistic: 1.812e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.25227 -0.05376 -0.01961  0.03641  0.39876 
## 
## Coefficients:
##                       Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)          -0.100768   0.029653  -3.398             0.000687 ***
## log(train[[coin_x]])  0.756217   0.007587  99.669 < 0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1039 on 3071 degrees of freedom
## Multiple R-squared:  0.7639, Adjusted R-squared:  0.7638 
## F-statistic:  9934 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.41215 -0.07308  0.04379  0.08637  0.35360 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)           1.02284    0.02895   35.33 <0.0000000000000002 ***
## log(train[[coin_x]])  1.01010    0.01013   99.67 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1201 on 3071 degrees of freedom
## Multiple R-squared:  0.7639, Adjusted R-squared:  0.7638 
## F-statistic:  9934 on 1 and 3071 DF,  p-value: < 0.00000000000000022
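
The `adf_stat` column in the tables above is the second step of the Engle-Granger procedure described in section 4: an ADF regression on the residuals of each fit, with a non-zero mean, no trend, and one lag. A minimal base-R sketch of that calculation follows, assuming simulated data. `adf_stat_sketch` is a hypothetical helper, and the document's `test_cointegration` may be implemented differently.

```r
# Hypothetical helper: Engle-Granger statistic with drift, no trend, one lag.
adf_stat_sketch <- function(log_y, log_x) {
  # Step 1: cointegrating regression; keep the residuals e_t.
  resid_eg <- unname(residuals(lm(log_y ~ log_x)))

  # Step 2: ADF regression  d e_t = a + g * e_{t-1} + f * d e_{t-1} + u_t.
  d <- diff(resid_eg)        # d[i] = e[i + 1] - e[i]
  n <- length(d)
  d_t   <- d[2:n]            # current difference
  e_lag <- resid_eg[2:n]     # lagged level
  d_lag <- d[1:(n - 1)]      # lagged difference
  adf_fit <- lm(d_t ~ e_lag + d_lag)

  # The ADF statistic is the t value on the lagged level.
  unname(coef(summary(adf_fit))["e_lag", "t value"])
}

# Simulated cointegrated pair (assumed data): a random walk plus stationary noise.
set.seed(42)
log_x <- 4 + cumsum(rnorm(500, sd = 0.01))
log_y <- 0.5 + 0.8 * log_x + rnorm(500, sd = 0.02)
stat <- adf_stat_sketch(log_y, log_x)
print(stat)
```

More negative values are stronger evidence of mean-reverting residuals. Every value shown in the tables is below -3.5, although the cutoff used is not stated in this part of the document. Because the residuals come from an estimated regression, the statistic should be compared against Engle-Granger critical values rather than the standard Dickey-Fuller ones.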

20. Cross Validation March 2017

plot_many(pricing_data = pricing_data, 
          time_resolution = time_resolution, 
          cutoff_date = "2017-03-01", 
          train_window = train_window, 
          test_window = test_window, 
          threshold_z = threshold_z) 
## # A tibble: 12 x 3
##      coin_y    coin_x  adf_stat
##       <chr>     <chr>     <dbl>
##  1 USDT_LTC  USDT_ZEC -4.204591
##  2 USDT_XMR  USDT_BTC -4.108816
##  3 USDT_XMR  USDT_LTC -4.039288
##  4 USDT_LTC  USDT_REP -3.989227
##  5 USDT_XMR USDT_DASH -3.973108
##  6 USDT_LTC  USDT_XMR -3.948913
##  7 USDT_XMR  USDT_ETH -3.862021
##  8 USDT_LTC  USDT_ETH -3.852363
##  9 USDT_XMR  USDT_ZEC -3.813835
## 10 USDT_LTC USDT_DASH -3.712336
## 11 USDT_XMR  USDT_REP -3.697045
## 12 USDT_LTC  USDT_BTC -3.596645
## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.073839 -0.017634  0.003176  0.016558  0.048787 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          0.954515   0.011342   84.16 <0.0000000000000002 ***
## log(train[[coin_x]]) 0.114562   0.003222   35.55 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01962 on 3071 degrees of freedom
## Multiple R-squared:  0.2916, Adjusted R-squared:  0.2914 
## F-statistic:  1264 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.107717 -0.027904 -0.005535  0.026504  0.087241 
## 
## Coefficients:
##                       Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)           3.597869   0.058801   61.19 <0.0000000000000002 ***
## log(train[[coin_x]]) -0.153152   0.008464  -18.09 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03564 on 3071 degrees of freedom
## Multiple R-squared:  0.09634,    Adjusted R-squared:  0.09604 
## F-statistic: 327.4 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.10417 -0.02179 -0.00559  0.01850  0.09753 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)           1.76882    0.03691   47.92 <0.0000000000000002 ***
## log(train[[coin_x]])  0.56361    0.02718   20.73 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03511 on 3071 degrees of freedom
## Multiple R-squared:  0.1228, Adjusted R-squared:  0.1225 
## F-statistic: 429.9 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.091934 -0.013630  0.004628  0.015181  0.061119 
## 
## Coefficients:
##                       Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)           1.511292   0.007737  195.34 <0.0000000000000002 ***
## log(train[[coin_x]]) -0.101738   0.005114  -19.89 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02194 on 3071 degrees of freedom
## Multiple R-squared:  0.1142, Adjusted R-squared:  0.1139 
## F-statistic: 395.7 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.102551 -0.027109 -0.005734  0.027462  0.091790 
## 
## Coefficients:
##                       Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)           2.692115   0.009553  281.80 <0.0000000000000002 ***
## log(train[[coin_x]]) -0.053538   0.003227  -16.59 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03591 on 3071 degrees of freedom
## Multiple R-squared:  0.08226,    Adjusted R-squared:  0.08196 
## F-statistic: 275.3 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.065242 -0.016374 -0.002281  0.017201  0.067778 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)           0.80554    0.02663   30.25 <0.0000000000000002 ***
## log(train[[coin_x]])  0.21786    0.01051   20.73 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02183 on 3071 degrees of freedom
## Multiple R-squared:  0.1228, Adjusted R-squared:  0.1225 
## F-statistic: 429.9 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.100185 -0.029629 -0.004341  0.026739  0.096462 
## 
## Coefficients:
##                       Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)           2.702994   0.015659   172.6 <0.0000000000000002 ***
## log(train[[coin_x]]) -0.068144   0.006308   -10.8 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0368 on 3071 degrees of freedom
## Multiple R-squared:  0.03661,    Adjusted R-squared:  0.0363 
## F-statistic: 116.7 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.087069 -0.016695  0.004296  0.015469  0.061586 
## 
## Coefficients:
##                       Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)           1.495439   0.009601  155.75 <0.0000000000000002 ***
## log(train[[coin_x]]) -0.055577   0.003868  -14.37 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02256 on 3071 degrees of freedom
## Multiple R-squared:  0.063,  Adjusted R-squared:  0.0627 
## F-statistic: 206.5 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.090643 -0.025600 -0.005485  0.027094  0.095963 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          2.262317   0.021113  107.15 <0.0000000000000002 ***
## log(train[[coin_x]]) 0.077211   0.005998   12.87 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03652 on 3071 degrees of freedom
## Multiple R-squared:  0.0512, Adjusted R-squared:  0.05089 
## F-statistic: 165.7 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.08378 -0.01672  0.00189  0.01633  0.06534 
## 
## Coefficients:
##                       Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)           1.410630   0.006125 230.291 <0.0000000000000002 ***
## log(train[[coin_x]]) -0.017956   0.002069  -8.679 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02303 on 3071 degrees of freedom
## Multiple R-squared:  0.02394,    Adjusted R-squared:  0.02362 
## F-statistic: 75.32 on 1 and 3071 DF,  p-value: < 0.00000000000000022
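
Sections 17 to 21 repeat the same `plot_many` call and change only `cutoff_date`. One way to express that is to generate the monthly cutoffs and loop over them. In the sketch below, `run_fold` is a hypothetical stand-in for `plot_many`, so the example runs without the pricing data.

```r
# Monthly cutoff dates used in sections 17 to 21, newest first.
cutoff_dates <- rev(format(seq(as.Date("2017-02-01"),
                               as.Date("2017-06-01"),
                               by = "month")))

# Hypothetical stand-in for plot_many(); it only reports which fold would run.
run_fold <- function(cutoff_date) {
  message("Running fold with cutoff_date = ", cutoff_date)
  invisible(cutoff_date)
}

folds <- vapply(cutoff_dates, run_fold, character(1), USE.NAMES = FALSE)
```

In the real report the loop body would call `plot_many` with the shared `time_resolution`, `train_window`, `test_window`, and `threshold_z` values and the current `cutoff_date`.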

21. Cross Validation February 2017

plot_many(pricing_data = pricing_data, 
          time_resolution = time_resolution, 
          cutoff_date = "2017-02-01", 
          train_window = train_window, 
          test_window = test_window, 
          threshold_z = threshold_z) 
## # A tibble: 7 x 3
##     coin_y    coin_x  adf_stat
##      <chr>     <chr>     <dbl>
## 1 USDT_REP  USDT_ETH -4.735592
## 2 USDT_REP USDT_DASH -4.488778
## 3 USDT_REP  USDT_BTC -4.291817
## 4 USDT_REP  USDT_XMR -4.282674
## 5 USDT_REP  USDT_ZEC -4.263626
## 6 USDT_REP  USDT_LTC -4.182799
## 7 USDT_LTC  USDT_XMR -3.660178
## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.152349 -0.034262 -0.004368  0.026838  0.206739 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)           0.24802    0.02643   9.384 <0.0000000000000002 ***
## log(train[[coin_x]])  0.52599    0.01147  45.850 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05123 on 3071 degrees of freedom
## Multiple R-squared:  0.4064, Adjusted R-squared:  0.4062 
## F-statistic:  2102 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.11249 -0.03994 -0.00350  0.02961  0.23109 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          0.614560   0.023326   26.35 <0.0000000000000002 ***
## log(train[[coin_x]]) 0.324103   0.008944   36.24 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05565 on 3071 degrees of freedom
## Multiple R-squared:  0.2995, Adjusted R-squared:  0.2993 
## F-statistic:  1313 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.187670 -0.025774  0.001334  0.040564  0.195610 
## 
## Coefficients:
##                      Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)          -0.53935    0.10271  -5.251          0.000000161 ***
## log(train[[coin_x]])  0.29335    0.01508  19.458 < 0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06273 on 3071 degrees of freedom
## Multiple R-squared:  0.1098, Adjusted R-squared:  0.1095 
## F-statistic: 378.6 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.171569 -0.027241  0.006699  0.043803  0.207157 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)           1.04220    0.02432   42.86 <0.0000000000000002 ***
## log(train[[coin_x]])  0.16390    0.00955   17.16 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06351 on 3071 degrees of freedom
## Multiple R-squared:  0.08752,    Adjusted R-squared:  0.08722 
## F-statistic: 294.5 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.176949 -0.029198  0.006601  0.040602  0.231768 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)           0.90290    0.05374   16.80 <0.0000000000000002 ***
## log(train[[coin_x]])  0.14702    0.01420   10.35 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06536 on 3071 degrees of freedom
## Multiple R-squared:  0.03372,    Adjusted R-squared:  0.0334 
## F-statistic: 107.2 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.185246 -0.029849 -0.003585  0.038725  0.276165 
## 
## Coefficients:
##                      Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)           1.61473    0.02559  63.109 < 0.0000000000000002 ***
## log(train[[coin_x]]) -0.11182    0.01836  -6.091        0.00000000126 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06609 on 3071 degrees of freedom
## Multiple R-squared:  0.01194,    Adjusted R-squared:  0.01161 
## F-statistic:  37.1 on 1 and 3071 DF,  p-value: 0.000000001262

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.108888 -0.029757 -0.009282  0.029736  0.122705 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          0.323961   0.015689   20.65 <0.0000000000000002 ***
## log(train[[coin_x]]) 0.420028   0.006162   68.16 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04098 on 3071 degrees of freedom
## Multiple R-squared:  0.6021, Adjusted R-squared:  0.6019 
## F-statistic:  4646 on 1 and 3071 DF,  p-value: < 0.00000000000000022
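
Each summary above is the first Engle-Granger step, a regression of log(coin_y) on log(coin_x) over the training window. The following is a minimal sketch of that step on simulated data, assuming nothing about the real Poloniex series. The names sim, hedge_ratio, and spread are illustrative and do not appear in the original functions.

```r
# Simulated data only; none of these numbers come from the pricing data set.
set.seed(1)
n <- 3073                                         # 3071 residual df + 2 coefficients
log_x <- 3 + cumsum(rnorm(n, sd = 0.01))          # random-walk log price (independent coin)
log_y <- 0.5 + 0.4 * log_x + rnorm(n, sd = 0.05)  # dependent coin plus a stationary spread

sim <- data.frame(coin_x = exp(log_x), coin_y = exp(log_y))

# Step 1 of Engle-Granger: regress log(coin_y) on log(coin_x).
model <- lm(log(coin_y) ~ log(coin_x), data = sim)

# The slope is the estimated long-run relationship between the two log prices.
hedge_ratio <- unname(coef(model)[2])

# The residuals are the series that step 2 (the ADF test) would examine.
spread <- residuals(model)

print(hedge_ratio)
print(summary(spread))
```

The slope should land close to the simulated value of 0.4. In the real runs the slope varies widely between pairs and windows, which is one reason the pairs are re-selected at every cutoff date.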

22. Cross Validation January 2017

plot_many(pricing_data = pricing_data, 
          time_resolution = time_resolution, 
          cutoff_date = "2017-01-01", 
          train_window = train_window, 
          test_window = test_window, 
          threshold_z = threshold_z) 
## # A tibble: 10 x 3
##      coin_y    coin_x  adf_stat
##       <chr>     <chr>     <dbl>
##  1 USDT_REP  USDT_ETH -5.179556
##  2 USDT_REP  USDT_XMR -4.969760
##  3 USDT_REP  USDT_ZEC -4.911000
##  4 USDT_REP  USDT_LTC -4.743256
##  5 USDT_REP  USDT_BTC -4.470370
##  6 USDT_REP USDT_DASH -4.430406
##  7 USDT_ETH  USDT_REP -3.687047
##  8 USDT_LTC  USDT_XMR -3.637524
##  9 USDT_BTC  USDT_XMR -3.571980
## 10 USDT_XMR  USDT_BTC -3.476249
## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.40720 -0.06478 -0.01449  0.05821  0.26591 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -0.48791    0.05090  -9.586 <0.0000000000000002 ***
## log(train[[coin_x]])  0.79247    0.02474  32.033 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08453 on 3071 degrees of freedom
## Multiple R-squared:  0.2505, Adjusted R-squared:  0.2502 
## F-statistic:  1026 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.51019 -0.06005  0.00946  0.06368  0.24738 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          0.682004   0.021230   32.12 <0.0000000000000002 ***
## log(train[[coin_x]]) 0.210199   0.009678   21.72 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09091 on 3071 degrees of freedom
## Multiple R-squared:  0.1332, Adjusted R-squared:  0.1329 
## F-statistic: 471.7 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.56757 -0.02793  0.00237  0.04551  0.27726 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          0.112144   0.035064   3.198              0.0014 ** 
## log(train[[coin_x]]) 0.266615   0.009071  29.392 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08626 on 3071 degrees of freedom
## Multiple R-squared:  0.2195, Adjusted R-squared:  0.2193 
## F-statistic: 863.9 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.50754 -0.05555  0.01447  0.04996  0.23690 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)           0.60732    0.02809   21.62 <0.0000000000000002 ***
## log(train[[coin_x]])  0.39541    0.02075   19.06 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09233 on 3071 degrees of freedom
## Multiple R-squared:  0.1058, Adjusted R-squared:  0.1055 
## F-statistic: 363.2 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.53972 -0.06862 -0.00017  0.06507  0.26323 
## 
## Coefficients:
##                      Estimate Std. Error t value     Pr(>|t|)    
## (Intercept)           0.39424    0.13783   2.860      0.00426 ** 
## log(train[[coin_x]])  0.11149    0.02056   5.424 0.0000000629 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09718 on 3071 degrees of freedom
## Multiple R-squared:  0.009488,   Adjusted R-squared:  0.009165 
## F-statistic: 29.42 on 1 and 3071 DF,  p-value: 0.00000006294

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.53979 -0.07294 -0.00371  0.06712  0.26543 
## 
## Coefficients:
##                      Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)           0.95111    0.05238  18.157 < 0.0000000000000002 ***
## log(train[[coin_x]])  0.08477    0.02328   3.641             0.000276 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09743 on 3071 degrees of freedom
## Multiple R-squared:  0.004298,   Adjusted R-squared:  0.003974 
## F-statistic: 13.26 on 1 and 3071 DF,  p-value: 0.0002761

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.215765 -0.031801  0.007254  0.038373  0.131981 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          1.695565   0.011305  149.98 <0.0000000000000002 ***
## log(train[[coin_x]]) 0.316041   0.009866   32.03 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05338 on 3071 degrees of freedom
## Multiple R-squared:  0.2505, Adjusted R-squared:  0.2502 
## F-statistic:  1026 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.097006 -0.032956 -0.009391  0.028709  0.156896 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          0.469397   0.009839   47.71 <0.0000000000000002 ***
## log(train[[coin_x]]) 0.403345   0.004485   89.92 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04213 on 3071 degrees of freedom
## Multiple R-squared:  0.7248, Adjusted R-squared:  0.7247 
## F-statistic:  8086 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.114979 -0.018137 -0.005200  0.009541  0.093811 
## 
## Coefficients:
##                      Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          5.686288   0.007575   750.7 <0.0000000000000002 ***
## log(train[[coin_x]]) 0.465448   0.003453   134.8 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03244 on 3071 degrees of freedom
## Multiple R-squared:  0.8554, Adjusted R-squared:  0.8554 
## F-statistic: 1.817e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

## 
## Call:
## lm(formula = log(train[[coin_y]]) ~ log(train[[coin_x]]))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.181951 -0.033320 -0.000486  0.032685  0.215079 
## 
## Coefficients:
##                       Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)          -10.13423    0.09142  -110.9 <0.0000000000000002 ***
## log(train[[coin_x]])   1.83783    0.01363   134.8 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06445 on 3071 degrees of freedom
## Multiple R-squared:  0.8554, Adjusted R-squared:  0.8554 
## F-statistic: 1.817e+04 on 1 and 3071 DF,  p-value: < 0.00000000000000022

23. Cross Validation Full

cutoff_dates <- seq(ymd("2017-01-01"), ymd("2017-10-01"), by = test_by)
results <- tibble() 
for (cutoff_date in cutoff_dates) { 
  cutoff_date <- as.Date(cutoff_date, origin = "1970-01-01")  # for() strips the Date class, leaving days since the epoch
  print("Cross validating strategy.")
  print(str_c("Using train set from ", cutoff_date - train_window, " to ", cutoff_date, ".")) 
  print(str_c("Using test set from ", cutoff_date, " to ", cutoff_date + test_window, "."))  
  train <- prepare_data(pricing_data = pricing_data, 
                        time_resolution = time_resolution, 
                        start_date = cutoff_date - train_window, 
                        end_date = cutoff_date) 
  test <- prepare_data(pricing_data = pricing_data, 
                       time_resolution = time_resolution, 
                       start_date = cutoff_date, 
                       end_date = cutoff_date + test_window) 
  test <- test %>% 
    mutate(return_strategy = 
             backtest_strategy(train = train, 
                               test = test, 
                               selected_pairs = select_pairs(train = train, coin_pairs = create_pairs(quote_currency = quote_currency)), 
                               threshold_z = threshold_z), 
           return_strategy_change = return_strategy / lag(return_strategy, 1) - 1) %>% 
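    # Replace NAs with 0. ifelse() drops the POSIXct class, so date_time becomes numeric here and is restored after the loop.
    mutate_all(funs(ifelse(is.na(.), 0, .)))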
  results <- bind_rows(results, test) 
} 
## [1] "Cross validating strategy."
## [1] "Using train set from 2016-11-30 to 2017-01-01."
## [1] "Using test set from 2017-01-01 to 2017-01-17."
## [1] "Cross validating strategy."
## [1] "Using train set from 2016-12-16 to 2017-01-17."
## [1] "Using test set from 2017-01-17 to 2017-02-02."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-01-01 to 2017-02-02."
## [1] "Using test set from 2017-02-02 to 2017-02-18."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-01-17 to 2017-02-18."
## [1] "Using test set from 2017-02-18 to 2017-03-06."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-02-02 to 2017-03-06."
## [1] "Using test set from 2017-03-06 to 2017-03-22."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-02-18 to 2017-03-22."
## [1] "Using test set from 2017-03-22 to 2017-04-07."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-03-06 to 2017-04-07."
## [1] "Using test set from 2017-04-07 to 2017-04-23."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-03-22 to 2017-04-23."
## [1] "Using test set from 2017-04-23 to 2017-05-09."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-04-07 to 2017-05-09."
## [1] "Using test set from 2017-05-09 to 2017-05-25."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-04-23 to 2017-05-25."
## [1] "Using test set from 2017-05-25 to 2017-06-10."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-05-09 to 2017-06-10."
## [1] "Using test set from 2017-06-10 to 2017-06-26."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-05-25 to 2017-06-26."
## [1] "Using test set from 2017-06-26 to 2017-07-12."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-06-10 to 2017-07-12."
## [1] "Using test set from 2017-07-12 to 2017-07-28."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-06-26 to 2017-07-28."
## [1] "Using test set from 2017-07-28 to 2017-08-13."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-07-12 to 2017-08-13."
## [1] "Using test set from 2017-08-13 to 2017-08-29."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-07-28 to 2017-08-29."
## [1] "Using test set from 2017-08-29 to 2017-09-14."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-08-13 to 2017-09-14."
## [1] "Using test set from 2017-09-14 to 2017-09-30."
## [1] "Cross validating strategy."
## [1] "Using train set from 2017-08-29 to 2017-09-30."
## [1] "Using test set from 2017-09-30 to 2017-10-16."
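
The log above traces a walk-forward schedule: each iteration trains on the window ending at the cutoff date and tests on the window that follows it. The sketch below rebuilds that schedule in base R. The values of train_window, test_window, and test_by are assumptions inferred from the printed dates (32 days, 16 days, and a 16-day step); their actual definitions sit earlier in the document and may use a different type.

```r
# Assumed values, inferred from the printed log rather than from the source code.
train_window <- 32         # days of training data before each cutoff
test_window  <- 16         # days of test data after each cutoff
test_by      <- "16 days"  # step between consecutive cutoff dates

cutoff_dates <- seq(as.Date("2017-01-01"), as.Date("2017-10-01"), by = test_by)

# One row per iteration of the cross-validation loop.
schedule <- data.frame(
  train_start = cutoff_dates - train_window,
  cutoff      = cutoff_dates,
  test_end    = cutoff_dates + test_window
)

print(nrow(schedule))
print(head(schedule, 3))
```

Because the step equals the test window, consecutive test sets tile the calendar without gaps, while consecutive training sets overlap by 16 days.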
results <- results %>% 
  mutate(return_strategy_cumulative = cumprod(1 + return_strategy_change), 
         date_time = as.POSIXct(date_time, origin = "1970-01-01")) 
ggplot(results, aes(x = date_time)) + 
  geom_line(aes(y = return_strategy_cumulative), colour = "blue", size = 1) + 
  geom_hline(yintercept = 1, colour = "black") + 
  labs(title = "Cross-Validated Strategy Return", x = "Date", y = "Cumulative Return") 

print(results[["return_strategy_cumulative"]][nrow(results)]) 
## [1] 1.366628
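
The final figure comes from chaining the per-period changes of every test window with cumprod(1 + change). Here is a small sketch of that compounding, using made-up return levels for two windows. The helper pct_change is hypothetical and only mirrors the lag-and-replace-NA logic used in the loop.

```r
# Hypothetical strategy return levels for two consecutive test windows.
# Each window restarts at 1, as a fresh backtest would.
window_1 <- c(1.00, 1.02, 1.05)
window_2 <- c(1.00, 0.98, 1.01)

# Period-over-period change; the first observation has no predecessor,
# so its NA is replaced with 0 (no change).
pct_change <- function(x) {
  change <- x / c(NA, head(x, -1)) - 1
  change[is.na(change)] <- 0
  change
}

changes <- c(pct_change(window_1), pct_change(window_2))
cumulative <- cumprod(1 + changes)

print(cumulative)
print(tail(cumulative, 1))
```

Within a window the changes telescope to that window's final level, so the last cumulative value equals the product of the window returns, here 1.05 times 1.01. Setting the first change of each window to 0 is what lets the windows chain without a jump at the boundary.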